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Abstract 



We present a generic computational framework for the simulation of viral capsid 
assembly which is quantitative and specific. Starting from PDB files containing 
atomic coordinates, the algorithm builds a coarse grained description of protein 
oligomers based on graph rigidity. These reduced protein descriptions are used 
in an extended Gillespie algorithm to investigate the stochastic kinetics of the 
assembly process. The association rates are obtained from a diffusive Smolu- 
chowski equation for rapid coagulation, modified to account for water shielding 
and protein structure. The dissociation rates are derived by interpreting the 
splitting of oligomers as a process of graph partitioning akin to the escape from 
a multidimensional well. This modular framework is quantitative yet computa- 
tionally tractable, with a small number of physically motivated parameters. The 
methodology is illustrated using two different viruses which are shown to follow 
quantitatively different assembly pathways. We also show how in this model 
the quasi-stationary kinetics of assembly can be described as a Markovian cas- 
cading process in which only a few intermediates and a small proportion of 
pathways are present. The observed pathways and intermediates can be related 
a posteriori to structural and energetic properties of the capsid oligomers. 

Key words: Virus self-assembly; Gillespie algorithm; Protein models; Stochas- 
tic processes; Biophysical modelling. 
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Introduction 

Viruses are the cause of some of the deadliest diseases today. In fact, the lethal- 
ity of viruses emanates from their simplicity; as the ultimate non-autonomous 
parasites, viruses cannot replicate without a host cell and are therefore immune 
to standard anti-bacterial drugs. Basically, a virus consists of two components: 
genetic material (DNA or RNA) and a protective protein shell, the capsid. In 
a self-referencing loop, the viral nucleic acids encode the proteins that form the 
viral capsid. Once the virus penetrates a host cell, it hijacks the cellular ma- 
chinery of the host and uses it to replicate the viral genome and to express the 
viral protein(s), which then assemble into capsids. As a result, the infected cell 
acts as a replicator of new viruses instead of performing its normal tasks (0). 

Another remarkable feature of viruses is that capsids are commonly quasi- 
spherical with icosahedral symmetry <§jj, |2j). Although other viral structures, 
such as cigar shaped and partial sheets, are possible, we restrict our investiga- 
tion to icosahedral capsids. Because encoding a large protein to envelop the 
whole genome is not physically realizable, identical copies of the same protein 
are used in a symmetric arrangement. Therefore, symmetry is used to econo- 
mize the number of distinct proteins encoded in the viral genomes. This was 
formalized beautifully in the classic theory of quasi- equivalence 

Si, 

which 

broadly predicts the manner in which identical asymmetric protein units can be 
used to form a symmetric capsid. Quasi-equivalent viruses are characterized by 
their T-number, the number of proteins in each asymmetric unit (Fig.^). This 
leads to icosahedral capsids with 60 T proteins, where geometrical constraints 
dictate that T = h 2 + hk + k 2 , with h and k non-negative integers. Clearly, 
viral capsids with larger T-values enclose a larger volume while maintaining 
icosahedral symmetry. 

The assembly of the capsid, a crucial step in the virus life cycle, could provide 
an opportunity to interfere with the process of virus replication (0). However, 
although there is a wealth of structural capsid data from X-ray crystallography 
and cryo-electron microscopy, the assembly pathways remain largely uncharted. 
It is known that inside the cell the capsid is assembled around the virus genome 
(DNA or RNA) with only limited or no assistance from other bio- molecules (0). 
Even more remarkable, for some viruses self-assembly can take place in vitro 
, in the absence of the genome and outside the cellular environment, and still 
lead to stable capsids that are indistinguishable from those created in vivo. The 
role of the genome in the assembly process is not fully clarified and it may well 
be that in vivo and in vitro assemblies follow different routes (0). 

Because detailed experimental data on assembly routes is at present difficult 
to obtain (03, E3, 0> 03); modelling and simulation approaches have come to 
play an important role in the understanding of this process. In particular, 
one would like to identify the pathways by which the oligomers combine to 
form the final capsid and the factors that can influence the process. Previous 
theoretical work has approached different aspects of the assembly process using 
a variety of techniques: from dynamic to static models, both microscopic and 
macroscopic (0, H E E E E Q E E E EH 
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Ideally, a fully dynamic view of the assembly process could be achieved by 
performing molecular dynamics (MD) simulations with a full atomic description 
of the proteins in aqueous solution. However, the computational cost of full MD 
restricts its applicability to simplistic models of proteins — essent ially , balls with 
sticky pods under Brownian motion. Schwartz and Berger (0, ll(t > performed 
such a dynamical simulation of capsid formation, where they showed that the 
assembly can be com plet ed using only local information in the incomplete capsid. 
Recently, Rapaport Ijllh has presented more realistic MD results that capture 
some of the salient features of a generic virus self-assembly process but still 
lacking the necessary detail to investigate specific viruses. 

Quantitative results for specific viruses can only be obtained through the use 
of a more detailed protein model. However, it is currently infeasible to simulate 
explicit dynamics of such a large ensemble of hydrated proteins due to the size 
and complexity of the units. This has led to microscopic approaches in which the 
partially completed capsid is investigated as it is assembled quasi- statically. The 
assumption here is that the relative positions (and thus, the interactions and 
energies) in the incomplete capsid are identical to those found in the complete 
capsid. However, this is itself a computationally hard problem due to the com- 
binatorial number of assembly pathways. Horton and Lewis l|l2h were the first 
to use combinatorial optimization to find substructures with the most favorable 
association energies. This scheme was further extended by Reddy et al. ifisl) 
with a more refined method for calculating the energies. Beyond purely ener- 
getic considerations, structural concepts have been used to characterize protein 
assemblies: Sitharam and Agbandje-McKenna ijlih have used combinatorial and 
computational algebra to create models based on static geometric and tenscg- 
rity constraints, while Hespenheide et al. ijlfJ ) have investigated rigid protein 
assemblies as likely candidates to be long-lived. 

Alternatively, other theoretical studies have concentrated on more macro- 
scopic approaches. Some studies have focused on the static mechanical struc- 
ture of the full capsid rather than the dynamics of the assembly 
Recent work of Bruinsma et al. (HUE! (see also (EJ for more qualitative ideas) 
is based on statistical mechanics calculations of free energies that take into ac- 
count the curvature of the capsid. Finally, the macroscopic kinetic approach 
pursued by the group led by Zlotnick (|J, Il6l [H GH (see also (03)) describes 
capsid assembly through empirical, law of mass action differential equations for 
the concentration of the different oligomers. However, although the results can 
be related to bulk concentration measurements, this kinetic approach is still 
unable to provide information about microscopic pathways. In recent work, En- 
dres et al. |25h have concluded that only a few out of the combinatorially many 
intermediates play any role and that these cannot be predicted by considering 
minimal energy configurations alone. 

In this paper, we develop a modelling framework that incorporates atomic 
detail of proteins into an explicit implementation of the kinetics of capsid as- 
sembly as a stochastic process. Our model starts from atomic descriptions of 
the protein oligomers, available from databases such as VIPER (26), and sim- 
plifies the representation through a reduction of the degrees of freedom based 
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on graph rigidity measures with the aid of the software FIRST l|27h . These 
reduced oligomer descriptions are used to simulate stochastically the process of 
capsid formation, without allowing for malformed structures, through an ex- 
tended Gillespie algorithm l|28h . Our scheme includes both diffusive association 
and dissociation reactions whose reaction rates are derived using the reduced 
graph representations. Although our algorithm does not implement dynamics 
explicitly, it provides the stochastic time evolution of the system and the quasi- 
steady oligomer distribution. This information can be analyzed to infer which 
pathways are important in the assembly of specific viruses and the role that 
protein structure and chemical environment play in the assembly process. 



Reduced protein descriptions from full atomic mod- 
els 

In order to incorporate sufficient molecular detail, our computational framework 
starts from the detailed atomic structure of proteins as determined by crystal- 
lographic experiments. An invaluable resource is the database VIPER |26l) . 
which provides protein structures, transformation matrices, maps for adjacent 
proteins and binding energies for a large number of viruses. This full atom 
description of the protein oligomers needs to be simplified to make it tractable 
for computational purposes. The basic physical idea underlying our simplified 
protein model is the assumption that rigid substructures will effectively move 
as a block. This implies a reduction in the number of degrees of freedom and, 
consequently, in the effective size of the problem. 

The initial step is the addition of hydrogen atoms to the PDB structure us- 
ing the software WHAT IF (29). We then characterize the full atom structure 
of each oligomer with FIRST, a computational tool for the analysis of proteins 
developed by Jacobs et al. FIRST uses standard potentials to identify cova- 
lent and hydrogen bonds, salt bridges and hydrophobic tethers in the structure, 
and represents the protein as a bond bending network. This graph representa- 
tion, where nodes are atoms and edges indicate constraints introduced by bonds, 
is then analyzed with a computationally efficient algorithm (the pebble game) 
to identify flexible (underconstrained) and rigid (overconstrained) regions 03). 
FIRST also calculates the energies for all the bonds in the protein network. 

The output from FIRST can be used to produce a flexibility index Fi for each 
amino acid When Fi < 0, the amino acid is overconstrained, and therefore 
rigid; when Fi > the amino acid is floppy (underconstrained) . We then group 
adjacent residues with the same binary rigidity into rigid and floppy domains. 
As shown in Fig. [2 a protein typically consists of long, rigid domains separated 
by short, floppy hinge segments. It is important to point out that because 
graph rigidity is a nonlinear property, the rigidity of a protein may change as 
the aggregation proceeds, even though none of the atoms have moved relative to 
its neighbors. When two proteins bind, new constraints are added to the graph, 
usually leading to a more rigid network (see Fig.|3J). The procedure outlined in 
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this section amounts to a significant coarse-graining of the model: it starts from 
a full description (PDB file) with several thousand atomic coordinates for each 
protein and it outputs a representation consisting of a few rigid blocks (on the 
order of a few tens per monomer). It is this reduced representation (illustrated 
in Fig. |2J that we use to implement the stochastic kinetics of self-assembly. 



Stochastic kinetics of capsid assembly 

Studying the time evolution of the assembly process by integrating the equa- 
tions of motion is computationally infeasible even for reduced representations 
like those described above. There are two main obstacles for the implementation 
of a fully dynamical approach: first, the combinatorial explosion of the num- 
ber of intermediates for large aggregates of proteins — a problem that cannot be 
overcome by sheer computational power and that must be addressed at the mod- 
elling stage; second, the lack of tested and rigorous coarse-grained potentials for 
explicit dynamics of reduced protein models, especially when diffusion plays a 
significant role. To circumvent these problems, we consider instead the stochas- 
tic kinetics of the assembly process through an extended version of Gillespie's 
stochastic algorithm in which we consider dissociation and association events 
modulated by diffusion. 

Gillespie's classic algorithm jjjl l3lh was introduced in 1976 as a computa- 
tional tool for the stochastic simulation of chemical reactions. Recently, Gille- 
spie's algorithm has had a vigorous revival due to its relevance to many biological 
systems, where only small numbers of molecules are present. The theoretical 
basis for a stochastic formulation of chemical reactions is the chemical mas- 
ter equation which describes the probability that a given event (or no event) 
takes place over an infinitesimal time interval l|32f) . Unfortunately, the master 
equation is not solvable explicitly for systems involving more than a few dif- 
ferent molecules and reactions. Gillespie's algorithm addresses this numerically 
and provides an exact procedure for a Monte Carlo simulation of a system of 
reacting molecules. As is obvious in Fig. 0] the complexity of the pathways 
increases combinatorially with the size of the oligomers. The propensity of each 
reaction is a product of a combinatorial factor, dependent on the number of 
reactant molecules available for the reaction, and a rate constant, dependent 
on properties (such as size, velocity and mass) of the molecules involved in the 
reaction ijsij). 



Association events 

During capsid assembly, there are association and dissociation events. The asso- 
ciation events are elementary (bi- molecular) 'reactions' in which two oligomers 
collide to form a new complex. The association process of structured molecules 
in solution can be modelled as a succession of two independent processes: first, 
two oligomers must meet through a diffusive process; next, they must over- 
come a barrier in order to aggregate and reach the final bound state ijsiL Isfih . 
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In its standard form, the Gillespie algorithm assumes that the reactants are 
dilute, perfectly-mixed, structure-less molecules in vacuum. This is obviously 
not a good approximation in our case, and we have extended the algorithm to 
take into account diffusion, the influence of water, and geometric and entropic 
factors. Our approach is simpler than the explicit stochastic simulation of the 
spatio-temporal reaction-diffusion process using computationally intensive voxel 
models i|36l 1371) yet it captures the relevant physical features. 

To account for the diffusive rate, we use concepts from Smoluchowski's the- 
ory of rapid coagulation In its simplest form, this theory was developed 
for spherical colloidal particles and hence needs to be corrected when applied 
to protein aggregates with specific geometry and binding sites 

mum. 

It can 

be shown that the modified Smoluchowski rate is: 

fcff ° c = = (^DijRijmnj) k, (1) 

where k\^ is the Smoluchowski diffusive rate for hard spheres. Here, nt and 
rij are the unit concentrations of particle types i and j, and the diffusivity 
DijRij = Diri(r~ 1 +rJ 1 )(r i +rj) is related to D\ andri, the diffusion coefficient 
and radius of the monomer, and to and rj, the radii of particle types i and j. 
Based on a simple scaling geometric argument valid for disc-shaped oligomers, 
it can be assumed that the radius increases as the square root of the number of 
monomers. The dimensionless parameter k is a form factor, which reflects the 
probability that a collision between two oligomers will result in the formation of 
a complex. It accounts for the fact that the proteins will attach at a lower rate 
than homogenously 'sticky' particles due to their geometry and specific binding 
sites. It can also be interpreted as a generic entropic barrier that needs to be 
surmounted for association 

The aggregation of oligomers can occur in a number of different ways with 
different association energies E^' for the specific pairings (see Table QJ. When 
forming a new oligomer we assume the proteins to be at the positions that they 
attain in the complete capsid. This means that our model does not account 
for malformed capsids. Neither does it include the maturation or conforma- 
tional changes that are known to occur in some viruses. To model the fact that 
oligomers with large negative association energies are more likely to be formed, 
we multiply the rate in Eq.^by a Boltzmann factor exp(wE^ /ksT), where ks 

is the Boltzmann constant, T is the temperature and E^' is the association en- 
ergy. The association energy is modified to include the effect of water shielding. 
Because the protein is surrounded by water, the effective energy of inter-protein 
hydrogen bonds is reduced, since there exists the alternative of forming bonds 
with water molecules instead. It is important to note that both the form fac- 
tor k and the water shielding factor w can be estimated from experiments or 
molecular dynamics simulations (0,0,0)- 

The energies can be obtained from different sources but, for simplicity, we 
have used throughout this paper the energies as calculated by FIRST. We re- 
mark, however, that our algorithm is modular and more sophisticated energy 
calculations could be easily incorporated into our computational framework, 



Stochastic kinetics of virus assembly 



7 



e.g., CHARMM energies from VIPER HHH). 

For completeness, we have car- 
ried out a comparison between the association energies from FIRST and VIPER. 
We have checked that although the energies can differ significantly in absolute 
numbers (as shown in Tabled, both the ordering of the bond strengths and the 
localization of the bonds are broadly consistent between FIRST and VIPER. 



Dissociation events 

In addition to aggregating, oligomers can also break up into smaller units with 
a dissociation rate which is an indication of the longevity of an oligomer. The 
propensity of a dissociation event depends on the energy required to break the 
bonds that hold the oligomer together, but is also related to the redistribution 
of energy into the internal modes of the oligomer. It is in this context that our 
reduced description of protein oligomers becomes most helpful. 

We base our modelling of the dissociation process on transition-state theory 
as applied to the escape from a multi-dimensional well. In this framework, the 
escape rate from a well with N vibrational degrees of freedom is given by (|43) 

1 A (0) / x 

fcdiSS ° C = 2^7^\w ex P (-E (b) /k B T) , (2) 

where are the eigenfrequencies at the bottom of the well and Ap are the 
eigenfrequencies when the particle is at the point of escape (i.e., at the top of 
the barrier of height E^ b '). The generic Eq. |3can be related to the reduced 
protein model introduced in the previous section. If we view the oligomer as a 
harmonic network, where each domain is treated as a point mass and the bonds 
connecting domains as linear springs, then the original oligomer represents a 
local minimum in the energy landscape and escape from this well represents the 
physical process of splitting the oligomer. 

The eigenfrequencies of the system at equilibrium, \\ \ are obtained by 
diagonalizing the system Mi + Kx = 0, where M is the diagonal matrix of 
domain masses and K is the weighted Laplacian matrix of the graph. Each 
weight Kij is the stiffness of the bond connecting domain i and j obtained from 
Hooke's law, = 2E i j/x 2 i ^ with By being the energy of the bond and Xij the 
equilibrium distance of the bond. The diagonal elements of the stiffness matrix, 
Kn, are given by the condition that the sum of the elements in each row is 
zero. In our reduced network, there are two types of bonds to be included in 
the analysis: hydrogen bonds and covalent bonds. The energies are provided by 
FIRST: hydrogen bonds are of the order of —5 kcal/mol once they have been 
multiplied by the water shielding pre- factor w, and we assume the covalent 
bonds to be —74 kcal/mol, a value close to the typical energies of C-N and C-C 
bonds. (Note that this energy is not multiplied by w since there is no option for 
the covalent bonds to form bonds with the surrounding water molecules.) If two 
domains are linked by both hydrogen bonds and covalent bonds, only covalent 
bonds are considered since they are an order of magnitude stronger. 
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To obtain the dissociation rate in Eq. [21 for a given split, we first calculate 
the eigenfrequencies of the original system A^ via the generalized eigenvalue 
problem of the unmodified network. The eigenvalues of the system at the bar- 
rier, \f \ and the barrier height, E^ h \ are obtained by examining the possible 
partitions of the graph. A given partition is characterized by the minimal set 
of edges that is required to split it into two subgraphs. The total energy of 
the removed edges is equal to and the A^ 6 ' are obtained as the generalized 
eigenvalues of the partitioned graph. Indeed, when the graph is partitioned one 
eigenvalue becomes zero, which explains why the numerator and denominator 
do not run over the same indices. For most oligomers, the most favorable splits 
have an eigenmode ratio of about 10~ 2 , although this ratio can be up to five 
orders of magnitude larger in some cases. Similarly to the association events, 
each split is then considered within our Gillespie algorithm as an independent 
event with its own characteristic propensity with rate fc dlssoc given by Eq. |2J 

Clearly, there are many ways of splitting an oligomer. For example, the 
trimcr in Fig. can split in three different ways with different propensities. 
Since the total number of possible splits grows combinatorially with the num- 
ber of domains in the oligomer, we have reduced the complexity by imposing 
two constraints on the partitions: only hydrogen bonds are allowed to break, 
and only bi-partitions,(i.e., splits into two fragments) are considered. The latter 
is not an extreme assumption as splits resulting in more fragments can be com- 
posed of a number of subsequent bi-partitions. Under these restrictions, and 
due to the sparsity of the inter-monomer connections in the icosahedral lattice, 
the number of partitions grows sub-exponentially, the exact rate depending on 
the topology of the oligomer. The eigenvalue calculation for the dissociation 
events is the most time consuming step in our simulations. To speed up the 
calculations, we have devised a data structure that stores the results of known 
events for use in subsequent runs. 

Results of the simulations 

Our modelling framework is generic and can be applied to any icosahedral virus. 
In this section, we illustrate the output of the current version of the program 
with two small plant viruses: the T — 1 virus Satellite Panicum Mosaic Virus 
(SPMV, PDB code: lstm) and the T = 3 virus Southern Bean Mosaic Virus 
(SBMV, PDB code: 4sbv). Interestingly, SBMV is known to be capable of self- 
assembly in vitro (|44r ) whereas SPMV is not. Before we present some numerics, 
we make two technical points regarding the simulations. 

One advantage of our computational model is that it has relatively few, phys- 
ically meaningful parameters. Table |2] presents a summary of the parameters: 
(i) the temperature and concentration; (ii) the average radius and diffusion co- 
efficient of a monomer in order to calculate the diffusion rate in Eq. ^ and (iii) 
the bond constants used to derive the eigenfrequencies for the dissociation rate 
in Eq-EJ All these quantities are directly related to physical variables. There are 
three additional parameters that, although physically motivated, are of a more 
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conceptual nature. First, E cut is an input parameter for the software FIRST 
that specifies the cut-off energy for a hydrogen bond to exist. This can be loosely 
related to temperature and under standard conditions it is approximately —0.7 
kcal/mol (H3). Second, the fact that proteins are surrounded by water means 
that the effective strength of the hydrogen bonds is reduced by a factor w which 
has been estimated to be 10-25% using detailed MD calculations (|4lh. This 
parameter is related to pH and to the ionic strength of the solution. Third, 
the form factor k used in the modified Smoluchowski equation (Eq. ^| has been 
estimated to be in the range 10~ 3 — 10 -5 through computer simulations of dif- 
fusing proteins (0113). This parameter is related to the specific geometry and 
docking of the oligomers. 

The second technical point refers to size limitations in the software used. 
Our current implementation uses version 3.1 of the software FIRST, which is 
limited to analyzing protein structures with a maximum of 75,000 atoms 
This limitation is not intrinsic to the method (only to version 3.1) and future 
releases will extend its capabilities. This effectively means that at present we do 
not investigate dissociation paths for oligomers larger than 20 proteins even if 
the computations are fast. Therefore, our full simulations (including both asso- 
ciation and dissociation propensities) are run up to the formation of oligomers 
of size 20. However, we will also present simulations of the completion of the 
full capsid obtained from runs with association paths alone, which do not rely 
on the use of FIRST. 

The starting point for the simulations is a state where all proteins are present 
as monomers. The system then evolves towards aggregation into larger units. 
There is an initial transient during which a large amount of reactions involv- 
ing monomers take place. Relatively quickly, the concentration of a few key 
oligomers builds up and the system then settles into a quasi-steady state in 
which the concentration of the different oligomers remains relatively constant — 
except, of course, for monomers and 'completed' capsids (size larger than 20). 
Effectively, monomers are transformed into capsids via restricted pathways that 
do not alter significantly the average concentrations of the intermediates. We 
explore and characterize this cascading process in what follows. 

The quasi-steady solution 

We first illustrate some of the results through the analysis of the quasi-steady 
state in the assembly of the T = 1 SPMV virus (lstm). Each simulation starts 
with 1,000 monomers. To eliminate the transient, we do not collect statistics 
until the first oligomer of size larger than 20 is formed. At this point, we consider 
the system past the transient state, we remove the large oligomer and we record 
the time- weighted concentration average of all oligomers until the next > 20-mer 
is formed. We repeat this procedure 1, 000 times and average the results, which 
are presented in Fig. El It is important to note that we have also run simulations 
where starting from an empty system, we add monomers at a constant rate and 
remove oligomers larger than size 20. Once this open system reaches a quasi- 
steady state, we have checked that it behaves in the same way on average as 
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the one starting from a fixed number of monomers. 

Inspecting the simulations, we find that there are very few distinct oligomers 
with a significant presence throughout the process (Fig. [SJ. Only monomers, 
dimers and hexamers are present in any significant concentration during the 
formation of the capsid. The concentrations of all other oligo mers are negligible. 
A similar conclusion was also reached by Endres et al. (25). Interestingly, this 
is not just the result of the difference in the association energies; Fig. [5] shows 
that all oligomers have similar association energies (per monomer). 

The simulation data also yield information about the processes governing 
the kinetics of the system. Oligomers larger than hexamers are quite rare and 
as soon as one is created it tends to participate in a series of rapid reactions 
leading to a > 20-oligomer. This cascading behavior emerges because large 
intermediates tend to follow from favorable association energies and also tend to 
be stable with respect to dissociation. This view of the assembly as a cascading 
process is in good agreement with other dynamical simulations ffTll Hfil) . 

Oligomers with a significant concentration (monomers, dimers and hexam- 
ers) tend to fluctuate around a mean value. On the other hand, oligomers with 
negligible concentrations are not present most of the time and they react and 
disappear quickly when present. These two types of behavior are illustrated 
in Fig. El where we plot the average probability distributions of the concentra- 
tion of dimers, tetramers and hexamers at quasi-steady state. The distribution 
of dimers is 'Gaussian-like' around a high concentration while tetramers show 
the characteristics of a 'Poisson-like' distribution. Hexamers display less clear 
features. Indeed, although all the underlying elementary stochastic processes of 
aggregation and dissociation are Markovian, the structure of the kinetic network 
leads to a variety of quasi-steady distributions for the different intermediates. In 
the Discussion, we provide a simple theoretical argument of how these distinct 
behaviors emerge. 

The simulations of the assembly of lstm can be used to extract further 
details about the pathways in use in the network of reactions. To make this 
more explicit, we calculate the average transition probability of association and 
dissociation events as derived from the numerics. These probabilities form a 
transition matrix, which we present in Fig.[71i as a heat map. The upper trian- 
gular section of the matrix corresponds to association processes while the lower 
triangular section corresponds to dissociation reactions. Note that virtually all 
the dissociation events are confined to the small oligomers. One of the reasons 
that small oligomers are easier to split is that they have fewer inter-monomer 
bonds per protein. This can be understood from Eq. [3 where the Boltzmann 
factor has a large impact on the dissociation rate. 

Using these data, we show in Fig. |3> ; c that the assembly proceeds via 
a few pathways which thread through the combinatorially complex associa- 
tion/dissociation tree shown in Fig.0] These reactions lead to significant quasi- 
steady concentrations only for monomers, dimers, hexamers, 10-mers, 16-mers 
and 20-mers. A mere inspection of the binding energies prior to the simulations, 
would not lead to this outcome although it can be understood a posteriori in 
terms of the properties of the oligomers. For instance, almost all the lstm dimers 
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formed correspond to the dimer circled in Fig. [7^, which has a bond with 2- 
fold icosahedral symmetry in the full capsid (see Fig. and Table Since 
the other bonds involved in dimers are significantly weaker there will hardly be 
any other dimers present. The predominance of this dimer has consequences: 
the dominant hexamer can in turn be viewed as a combination of three of the 
2-fold symmetrical dimers bound by the weaker 3-fold symmetrical bonds. One 
of the conclusions of the stochastic simulations is that predicting the prevalent 
intermediaries cannot be based on energetic considerations alone. It is possible 
that stable and favorable intermediates, as determined by the static analysis, 
are never reached because the necessary kinetic steps are not accessible. 

A key idea behind our method is to study how chemical properties at the 
molecular level (as recorded in the protein atomic structure) lead to differences 
in the assembly path. To illustrate how our computational framework can help 
explore these connections, we analyze the assembly of the T — 3 virus SBMV 
(4sbv) in direct comparison to the results obtained above for the T = 1 virus 
SPMV (lstm): Fig.[S]shows the quasi-steady time-averaged concentrations and 
association energies for all the oligomer sizes, while Fig.[5]presents the heat map 
of transitions and the relevant pathways of assembly. The results are averaged 
over 1,000 runs of the quasi-steady formation of a > 20-mer, as before. 

The average concentrations, association energies and heat maps of 4sbv re- 
veal that trimers, hexamers, 9-mers (and, in general, all multiples of three) have 
local maxima in the concentration plot and corresponding local minima in the 
association energy plot. This is also visible in the heat map as a 'checkerboard 
pattern.' In this case, and contrary to lstm, trimers are the effective units in 
the assembly of 4sbv, in agreement with Reddy et al. l(l^ and expected not only 
for reasons of symmetry, but also from considering the bond energies. Interest- 
ingly, Reddy et al. (0) conjecture that the symmetric 15-mer will be the most 
stable oligomer. Although the analysis with FIRST indicates that this oligomer 
is favorable both in terms of association energies and of dissociation propensity, 
we find no evidence of significantly higher concentration than for other large 
oligomers. This could mean that although stable, this oligomer might not be 
kinetically easy to access. However, it is also possible that this is a result of our 
evaluation of the energies with FIRST, as opposed to the use of energies from 
VIPER. 

Comparing the quasi-steady concentrations of lstm and 4sbv in Figs.0and|Hl 
it is clear that the concentration of monomers is significantly higher for 4sbv. 
Moreover, from the heat map (Fig.^J it is evident that there are more reactions 
involving large oligomers. The cascading behavior is therefore less pronounced 
for 4sbv than for lstm as it is less rare to find two large oligomers present at the 
same time in the 'solution.' This behavior stems from the fact that the bonds 
in the symmetric 4sbv trimer are significantly stronger than the bonds linking 
the trimers. Thus, it is less favorable for a large 4sbv oligomer to react than it 
is for a large lstm oligomer. This is also reflected in the heat map: since large 
oligomers react more slowly, there will be more dissociation events (i.e., shaded 
squares below the diagonal in Fig. ^jp.) involving large oligomers for 4sbv. 
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The formation of the full capsid 

Up to now, we have focussed on the properties of the quasi-steady state, where 
we assume that the supply of monomers is constant and the cascading process of 
assembly leads to a stable output of capsids. We will now examine the kinetics 
of formation of a full icosahedral capsid from a finite amount of monomers. 

As explained above, our dissociation calculations have an upper limit of 
20-mers, due to the use of version 3.1 of the software FIRST. However, the 
cascading behavior described above for the lstm virus implies that once large 
oligomers are formed it is unlikely that they will split and thus the dissociation 
paths might be ignored without much change in the observed behavior. We have 
explored this idea in more detail by studying the sensitivity of the stochastic ki- 
netics to the form factor k, which modulates the balance between the association 
and dissociation pathways. Increasing k increases all association rates, which 
implies that the dissociation events will become less likely. Fig. shows the 
ratio between the number of dissociation events and the total number of events 
in the assembly of lstm as a function of k. For low k there are almost as many 
dissociation as association events, and the assembly proceeds very slowly or not 
at all. In this regime, a dimer will almost immediately be broken up once it has 
formed and the assembly process is never able to get started. As n increases, 
there is a relatively sharp drop in the number of dissociation events. Eventually, 
the number of dissociation events becomes negligible and the assembly process 
proceeds almost exclusively by association. 

Fig. llll shows the quasi-steady concentrations and the reaction pathways that 
appear in the assembly of lstm for different values of the form factor k. The key 
feature of these simulations, however, is that the same types of reactions occur 
for all values of k; that is, the main pathways remain unchanged even if there arc 
many dissociation events. Under the current setup for this virus, dissociation 
appears to slow the progress of aggregation by splitting small oligomers but it 
does not prompt the assembly to proceed through alternative pathways. 

A direct consequence of the particular kinetics of lstm is that forward (as- 
sociation) reactions are qualitatively similar for a range of values of k. If the 
value of k is relatively high, the rare dissociation events can be neglected. We 
can then run simulations with association paths alone (no longer capped by the 
size limit in FIRST) that lead to the explicit formation of complete capsids. 
In Fig. 112b . b, we show the concentration of the oligomers over time after the 
transient has been removed for k = 10~ 3 . As expected, only monomers, dimers, 
hexamers and full capsids have any significant presence, while all other interme- 
diate oligomers do not appear in any persistent way. We also show in Fig. 112b 
that the rate of capsid formation saturates as the concentration of monomers 
decreases. The overall shape of this curve is in good agreement with experiments 
and other theoretical models iflfil Ht^. If we consider the almost linear section 
at the outset, we can derive an approximate capsid formation rate of 1.5 • 1CU 4 
M s _1 , which is of the same order of magnitude as the model by Endres and 
Zlotnick JH). 

As a final comment, it is interesting to note that 4sbv (SBMV) can form 
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capsids at significantly lower k values than lstm (SPMV), as seen in Fig. ITU1 
This conclusion cannot be drawn easily from the association energies alone: the 
most favorable lstm dimer and 4sbv dimer have association energies of —48 
and —38 kcal/mol, respectively. However, when a 4sbv dimer is formed, a 
favorable reaction to form a symmetric trimer tends to follow immediately. On 
the other hand, despite their higher binding energy, the lstm dimers have no 
such favorable aggregation pathway to form a stable large oligomer. 

Discussion and conclusions 

Understanding the quasi-steady solution as a Markov pro- 
cess 

Our Gillespie simulation of the kinetics of the network has shown that although 
the full assembly tree (Fig. [2J is extremely complex, only a few pathways are 
crucial for the assembly. In other words, our extended Gillespie algorithm pro- 
vides us with a stochastic sampling of the reaction network, unknown a priori, 
which leads to an estimate of the transition probabilities in the system. 

We can use the estimated transition matrix (represented in Figs.[7|3 and HJz 
as heat maps), to investigate the description of the reaction network as a non- 
homogeneous Markov process. To check the consistency of the quasi-steady 
solution obtained numerically in Figs. and |H1 we apply the results of Darroch 
and Seneta {is! ) for quasi-stationary Markov processes taking the stoichiometry 
into consideration l|4q) . The system is only gMasi-stationary because ultimately 
there is an absorbing state where all monomers are part of completed capsids. 
However, in the transient state the quasi-stationary distribution (QSD) can be 
calculated as the fixed point tt* of the following equation: 

>00 = ^=> ** T = KO, (3) 

where tt is the distribution of concentrations, e is the vector of ones, and Q 
is the transition rate matrix as derived from the simulations f47j) . There are 
two interpretations of the QSD (45): it can be viewed as a conditional sta- 
tionary distribution (i.e., the stationary distribution provided that the Markov 
process is in the transient), or as the expected time spent in each state divided 
by the total time to absorption. Fig. 03 shows that the QSD tt* is close to 
the average empirical distribution tt from the simulations. The transition rate 
matrix derived using the stochastic sampling is therefore consistent with the 
observed quasi-steady distribution under the assumption of a non-homogeneous 
semi-Markov process. This description also provides us insight into why some 
oligomers have a Gaussian-like distribution while others present Poisson-like 
features. In a system where the total number of monomers is fixed, the (quasi) 
stationary distribution will be multinomial ji^) . In the limit of large N and 
small TTi, the distribution of oligomer i can be approximated by a Poisson dis- 
tribution. For large N and intermediate TTi, the Gaussian distribution is a good 
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approximation. 

Explaining the probabilistic features of the cascading be- 
havior 

In the cascading process, a few oligomers are relatively long-lived while all other 
oligomers survive for only very short times before reacting. The existence of 
Gaussian-like and Poisson-like distributions is related to this cascading process 
and can be understood through a simplified kinetic model. Consider a toy model 
of the early stages of aggregation of lstm consisting of three oligomers (M2, M4 
and M 6 ), which can be thought of as analogues of the dimers, tetramers and 
hexamers, respectively: 



2M 2 
M 2 + M A 
M 6 

The first and last reactions correspond to creation from a source and decay to 
a sink and there are two reaction rates kn 3> fct- We simulate this system 
using the Gillespie algorithm. The resulting stationary distributions, shown in 
Fig. El present similar characteristics to those discussed in the lstm assembly 
process (see Fig.EJ). This can be understood as follows: the creation rate of the 
'dimers' M 2 is much higher than the rate at which they are consumed, leading to 
a Gaussian steady state, as predicted by the linear noise approximation of van 
Kampen l|32t l49f) . On the other hand, the 'tetramers' M4 have a low creation 
rate and there are always 'dimers' available with which they can react at a 
high rate. This leads to a Poisson-like distribution for the 'tetramers.' Finally, 
'tetramers' disappear quickly to create 'hexamers' Mq, which decay at a very 
low rate and thus have a Gaussian-like distribution. 

Summary and future work 

This paper presents a modular framework for the study of the stochastic kinet- 
ics of viral capsid assembly. The calculations are based on structural crystallo- 
graphic protein data and use rigidity analysis to produce a reduced mechanical 
description of the protein oligomers. Rates for association and dissociation re- 
actions based on the protein descriptions are then used within an extended 
Gillespie algorithm to explore the kinetics of capsid assembly. 

Because of its biophysical motivation, our model has relatively few param- 
eters and most of them are directly related to physical variables: temperature, 
concentration, diffusion coefficients, protein radius, covalent bond energies, and 
bond lengths. We have checked the dependence of our simulations on these 
physical variables. For instance, if the temperature is increased, dissociation 
events will become more likely and the overall rate of assembly will drop. In 



KH 


M 2 


k L 


M 4 


k H ; 


M 6 


k L> 


0. 



(4) 
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addition, the relative difference in association energies between oligomers de- 
creases. This means that the population of oligomers will become more varied 
and more reaction pathways will become important; that is, as the temperature 
increases, the assembly tree will be explored more homogeneously. Similarly, 
lowering the concentration decreases the association rate. If the concentration 
is too low, the dissociation events become prevalent and there will be no assem- 
bly. However, the characteristics of the assembly pathways are unchanged by 
concentration. 

There are three additional parameters (E cut , k and w) that have physical 
meaning and motivation, but are not easily related to a single physical variable. 
We have discussed in depth the effect of the form factor k in the preceding 
sections. In addition, we have checked that the results of our analysis do not 
depend qualitatively on the cut-off energy E cut or the water shielding constant 
w. Increasing the cut-off energy for hydrogen bonds in FIRST reduces the num- 
ber of hydrogen bonds in the system. This produces the same overall effect as 
increasing the temperature since all energies in the system are lowered. It also 
leads to floppier proteins with a higher eigenratio in Eq.[21and thus more dissoci- 
ation events. Increasing the water shielding w means stronger hydrogen bonds, 
which is equivalent to lowering the temperature. The assembly will thus pro- 
ceed along low energy pathways, with a small variety of oligomers and a reduced 
number of dissociation events. This discussion indicates that changes in both 
E cut and w can be qualitatively understood as an effective change of 'tempera- 
ture.' Note however that the effect of the form factor n is different. Physically, 
the increase of k is equivalent to lowering the barrier for two oligomers to form 
a larger oligomer with no influence on the dissociation process. Therefore, the 
likelihood of the dissociation events is reduced and the assembly is sped up. 

A key feature of the proposed framework is that it is both modular and ex- 
tensible, i.e., the algorithms that make up the different components of the model 
can be exchanged seamlessly at different levels. A number of refinements to the 
model should be pursued to improve the oversimplifications of this initial work. 
Indeed, the bond energies could be calculated more precisely using more de- 
tailed potentials. This can have far-reaching implications for the pathways and 
intermediates deduced from the simulations and a variety of energy calculations 
should be explored carefully when dealing with specific viruses (50). A key in- 
gredient of the protein model is the derivation of a reduced representation from 
the full PDB data. In this work, we have used FIRST for protein partitioning 
as a conceptual tool based on ideas from graph rigidity. However, one could use 
methods based on normal modes (full atom, elastic or gaussian models) or prin- 
cipal component analysis to obtain coarse-grained representations of proteins. 
Another important set of refinements should concentrate on the description of 
the association process. In particular, a more sophisticated model of the pro- 
tein docking, including its entropic aspects, would be necessary to improve the 
physical realism of the form factor n. Moreover, it would be important to refine 
the association rates to parallel more closely the kinetics of chemical assembly. 
The dissociation model itself could also be improved by taking explicitly into 
account entropic features and incorporating the geometric content of the graph 
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when computing the cigcnfrcqucncics. Finally, it would be important (although 
non-trivial) to extend our model to allow for non-icosahedral symmetries and 
for malformed capsids jlol lTlh. 

In summary, our work introduces a description of viral capsid formation as 
a stochastic assembly of protein oligomers. An important aspect is that our 
framework is data-driven, starting from molecular detail, and exhibits different 
assembly behaviors for different viruses, as exemplified by the results for lstm 
and 4sbv presented here. Importantly, no assumptions are made about specific 
intermediates through which the assembly has to proceed — all such phenomena 
emerge from the data. Our methodology bridges the gap between the static 
and dynamic models of viral assembly by using a stochastic sampling algorithm 
to investigate the assembly pathways. The sampling is done using an extended 
version of the Gillespie algorithm which is derived from fundamental physi- 
cal principles. This enables a mesoscopic simulation of the kinetics which is less 
computationally intensive than a microscopic MD simulation. Alternatively, this 
algorithm provides a physically based sampling of the assembly tree, as opposed 
to computationally intractable combinatorial optimization techniques l)l2l Il3j) . 
We are currently in the process of extending and refining our framework in 
several of the above directions as we pursue a general exploration of other icosa- 
hedral viruses in different families. 
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Tables 



Interface 


Symmetry 


E\f VIPER 
(kcal/mol) 


FIRST 

(kcal/mol) 






1-6 


Quasi 5 fold 


-21.0 


-9.0 


1-38 


Quasi 5-fold 


-21.0 


-9.0 


1-37 


Quasi 2-fold 


-29.0 


-48.7 


1-2 


Quasi 3-fold 


-33.0 


-24.7 


1-3 


Quasi 3-fold 


-33.0 


-24.7 



Table 1: Association energies from FIRST and VIPER. A comparison of 
the association energies for the Satellite Panicum Mosaic Virus (SPMV, PDB 
code lstm) computed using VIPER JH) and with FIRST with E cut = -0.7 
kcal/mol. The interfaces are shown in Fig. [Tfa. Note that in the simulations 
these energies are multiplied by the water shielding factor w— 0.17 to account 
for protein hydration. 



Parameter 


Value 


Units 


Temperature, T 


300 


K 


Initial monomer concentration, C 


5 


uM 


Monomer diffusion coefficient, D\ 


0.1 


nm 2 /s 


Monomer radius, r\ 


1 


nm 


Covalent bond strength 


-74 


kcal/mol 


Covalent bond length 


1.5 


A 


Hydrogen bond length 


3 


A 


H-bond effective strength, w 


0.17 




FIRST cut-off energy, E cut 


-0.7 


kcal/mol 



Table 2: Parameter values for the simulations. Besides those in the table, 
there is an additional parameter in the simulations: the form factor k which is 
initially chosen to be 10 -4 for lstm and 2 ■ 10~ 5 for 4sbv. The dependence of 
the results on k is shown in Figs. 1101 and II II 
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Figure Legends 

Figure ^ 

The icosahedral geometry of a T = 1 capsid. (a) There are 60 symmetri- 
cally equivalent lattice positions, each one occupied by an asymmetric protein. 
For the lstm virus, the protein positioned at 1 has bonds with those positioned 
at 2, 3, 6, 38 and 37 (but not with 4), with energies as shown in Tabled (b) A 
flattened view of the icosahedron above. 

Figure El 

Coarse-grained description of the lstm coat protein, (a) A view of the 
lstm monomer created using RasMol |5lf) . The protein consists of more than 
4,000 atoms in 157 amino acids, (b) The backbone of the lstm protein. The color 
scale represents rigidity as determined by the software FIRST with E cut = —0.7 
kcal/mol: Blue means rigid and red means floppy while the other colors represent 
intermediate rigidities <|27h . Adjacent amino acids with equal binary rigidity are 
grouped into rigid or floppy domains, leading to the typical pattern in capsid 
proteins: floppy ends and rigid domains separated by short floppy domains at 
the center, (c) Schematic representation of the domain structure in (b). Here, 
rigid domains are drawn as rectangular vertices and floppy domains are drawn 
as ellipses. The numbers represent the number of amino acids in each domain. 
The thick lines represent the covalent bonds in the backbone while the thin 
lines are hydrogen bonds. Note that the domain structure is only needed for 
the dissociation rates — the association rates can be computed directly from the 
bond energies. 

Figure Q2 

Change of rigidity as the assembly proceeds, (a) In T — 1 viruses each 
cell in the icosahedral lattice (Fig. ^) accommodates one protein: here we show 
a lstm monomer in its reduced description, as in Fig[5Jc). As the assembly 
proceeds, the rigidity of every oligomer is recalculated with FIRST from the 
full atomic data. The existence of inter-molecular hydrogen bonds in the lstm 
dimer (b) and trimer (c) modifies the rigidity of the constitutive monomeric 
units. Monomers with the same color have identical rigidity profile and domain 
partitioning. 

Figure |4] 

The combinatorial assembly tree of lstm. A tree representing all the pos- 
sible monomers, dimers and trimers in the assembly of lstm. The thin lines are 
possible assembly paths by the addition of one monomer at a time. The thick 
lines represent the pathways that are actually observed in the simulations (com- 
pare with Fig. EJ ■ The number of possible pathways explodes combinatorially 
as the size of the oligomer grows. 
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Figure 151 

Simulations of the quasi-stationary state of lstm lumped by oligomer 
size, (a) Average time- weighted concentration of the different oligomer sizes 
obtained in the Gillespie simulation (crosses). The solid line is a guide to the eye. 
Note the high concentration of dimers and hexamers. The circles (and dashed 
line) show the prediction from the quasi-stationary Markov process (Eq. |3J), 
which shows good agreement with the simulation, (b) Average time- weighted 
association energy per monomer of the different oligomer sizes. The error bars 
are hardly visible indicating that all different conformations of a given size have 
almost identical association energies. Hexamers lie at a local minimum, a clear 
signal that they are stable oligomers, relatively more favorable than heptamers 
and octamers. Note the missing data points for size 11, as no 11-mers are 
observed in the simulations. 

Figure [6] 

Different concentration distributions in the simulations of the lstm 
assembly, (a) Most oligomers are present at very low concentrations with 
Poisson-like distributions as exemplified by the lstm tetramers. This means 
that low concentration is connected with short persistence. (6) A few oligomers 
have significant concentrations at all times in the quasi-steady state, such as 
the lstm dimers and hexamers shown. Their distributions have Gaussian-like 
characteristics. Although there are three distinct lstm dimers, almost all dimers 
in the simulation are of the most energetically favorable type (the one circled 
in Fig.EJ). 

Figure [7] 

Stochastic sampling of the assembly pathways for lstm with k = 10 4 . 

(a) The heat map illustrates the observed frequency of the reactions in the as- 
sembly: white squares indicate no reactions involving the two sizes, while darker 
shades indicate many reactions of that type. Reactions above the diagonal rep- 
resent associations (Mi + Mj — ► Mfc, where k = i + j), while reactions below 
the diagonal correspond to dissociations (M& — ► Af< + Mj, with k — i + j). 
For example, the (1, 1) square in the top left corner represents a 'monomer plus 
monomer' association reaction, and the square to the right (1,2) is the associ- 
ation of monomer plus dimer. Meanwhile, the (2, 1) square represents dimers 
splitting into two monomers. (6) Using the heat map in (a) we represent the 
most common oligomers of size six or less and the transitions between them. 
Dashed arrows represent dissociation reactions. Only reactions with a frequency 
above a given threshold are represented. The thickness of the arrows is propor- 
tional to the logarithm of the frequency of the reaction, which means that the 
vast majority of the reactions involve forming or breaking up oligomers of size 
six or smaller. The majority of reactions in the system merge monomers to form 
dimers. Most hexamers are formed by merging three dimers. But there is also 
a second pathway where a trimer and a dimer form a pentamer that later adds 
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a monomer to complete the hexamer.(c) A lumped, schematic representation of 
the pathways in (6) (inside the dotted rectangle) showing also the higher steps 
of the cascade. 

Figure |8] 

Simulations of the quasi-stationary state of 4sbv lumped by oligomer 
size, (a) Average time- weighted concentration of the different oligomer sizes. 
There are peaks for trimers, hexamers and other multiples of three indicating 
that the trimer is an important building block in the assembly. Compare with 
lstm in Fig. 0(6) Average time- weighted association energy per monomer of 
the different oligomer sizes. The error bars are hardly visible because all dif- 
ferent conformations of a given size have almost identical association energies. 
Oligomers formed by multiples of three tend to occupy local minima of the en- 
ergy, a hint of their enhanced stability. Note the missing data points for sizes 
8, 14, 17, 19 and 20 since these oligomers are never observed in the simulations. 

Figure Q3 

Stochastic sampling of the assembly pathways for 4sbv with n — 2-10 -5 . 
(a) The heat map shows the frequency of the reactions taking place. Note how 
the pattern differs from that of lstm (Fig. and 1 1 1|> . There is a variety of large 
oligomers present at any given time in the system and the cascading behavior 
is less pronounced. The checkerboard pattern indicates that reactions involving 
multiple-trimer oligomers are the most common. (6) A schematic view of the 
most common reactions for 4sbv deduced from the heat map in (a). Again, note 
the differences with the assembly of lstm (Fig. EJ. 

Figure 1101 

Impact of the form factor k on the speed of assembly. The figure 
shows the average proportion of dissociation events for lstm (x) and 4sbv (o) 
calculated over 350 independent runs, with the standard deviation indicated 
by the error bars. As n increases (making the association more likely), the 
proportion of dissociation events decays from 0.5 towards zero. Therefore, k 
is directly related to the overall speed at which the assembly proceeds. If n is 
small, the cascading process is not initiated and the assembly stalls. Clearly, 
detailed models for the calculation of the form factor k would be of importance. 
Note the different k values at which both viruses would start to assemble, which 
is reminiscent of their proclivity to self-assembly in vitro. 

Figure 1111 

Impact of the form factor k on the pathways of assembly. Heat maps 
depicting the pattern of reactions for lstm with (a) k = 10~ 3 and (b) k = 10~ s . 
As compared to Fig. note that the overall cascading pattern of reactions 
and pathways remains broadly unchanged. In (a), larger k means that there 
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are fewer dissociation events, i.e., fewer dark squares below the diagonal, see 
Fig- El In (b), smaller k means more dissociation events, but the pattern of 
association for the larger oligomers is similar for all three lstm heat maps. 

Figure [TJ1 

Kinetics of the completion of the full capsid for lstm. (a) Average 
concentration of all oligomer sizes as a function of time. At t = all pro- 
teins are monomers and they rapidly react to form larger oligomers. The most 
conspicuous feature of the distribution is that there are very few oligomers of 
intermediate sizes. (6) Concentration of the most common oligomers: 1- (solid 
line), 2- (dashed line), 6- (x) and 60-mers (o) as a function of time. All other 
oligomers have negligible concentrations, (c) Time for the completion of SPMV 
capsids with k = 10 . For this value of k, the number of dissociation events is 
almost negligible, as shown in Fig. 1101 

Figure [TJ1 

Different oligomer distributions in a simple cascading system. The 

simple cascading reaction Eq. ^involves only three different types of oligomers: 
'dimers' (M2), 'tetramers' (M4) and 'hexamers' (Mg). In the quasi-steady state, 
'dimers' and 'hexamers' have Gaussian-like distributions while the distribution 
of 'tetramers' is Poisson-like (compare with Fig. EJ. 
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